Usernetes Gen2: depends on Rootless Docker on hosts #287
Conversation
Okay, so it sounds like I should try to remove the shared filesystem and get rootless working? Is there a way to change that path so I can throw it somewhere else?
I guess you can just make a symlink, or …
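If the rootless Docker data directory lives on the shared filesystem, relocating it could look like the sketch below. This is an assumption-laden example: `/scratch/$USER` is a hypothetical local path, and the `data-root` approach relies on rootless Docker reading `~/.config/docker/daemon.json`.

```shell
# Option 1: symlink the rootless Docker data dir to local storage
# (/scratch/$USER is a hypothetical local path; adjust for your hosts)
systemctl --user stop docker
mkdir -p /scratch/$USER
mv ~/.local/share/docker /scratch/$USER/docker
ln -s /scratch/$USER/docker ~/.local/share/docker
systemctl --user start docker

# Option 2: point the daemon's data-root there explicitly instead
mkdir -p ~/.config/docker
cat > ~/.config/docker/daemon.json <<EOF
{ "data-root": "/scratch/$USER/docker" }
EOF
systemctl --user restart docker
```

Either way, `docker info | grep "Docker Root Dir"` should confirm the new location afterwards.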
okay, making progress! I made the nodes isolated for now - we can try the above later. I was able to get rootless Docker installed and the control plane and nodes up. I'm trying to run the hack test now, and there is an error with the shell. When I Ctrl-C:
It looks like the entrypoint is doing wget to the others, so I can try that manually. Ah, there is a timeout:
Update: the same timeout happens with …
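A manual version of the check the entrypoint is doing might look like this. The IP is the one mentioned later in this thread, and the port is an assumption (6443 is the kubeadm/apiserver port used in the join command below); substitute whatever the entrypoint actually probes.

```shell
# Manually reproduce the entrypoint's wget probe between hosts
# (HOST_IP/PORT are examples; adjust to match the entrypoint)
HOST_IP=10.10.0.5
PORT=6443
wget --timeout=5 --tries=1 -qO- "http://${HOST_IP}:${PORT}" \
  && echo "reachable" \
  || echo "timed out or refused"
```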
Maybe related? https://gitlab.freedesktop.org/dbus/dbus/-/issues/374
Unlikely.
Also, please make sure "10.10.0.5" is the IP of the host (not the node container) that is reachable from other hosts.
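A quick way to verify that distinction: list the host's own addresses, then probe from a different host. The interface name and port below are assumptions for illustration.

```shell
# On the host itself (not inside a node container):
ip -4 addr show          # 10.10.0.5 should appear on a host interface

# From another host, confirm reachability:
ping -c 3 10.10.0.5
nc -vz -w 5 10.10.0.5 6443   # assumed apiserver port; adjust as needed
```

If the IP only exists inside the node container's network namespace, the probes from other hosts will fail even though local checks pass.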
This has happened twice now - it freezes when the worker node is connecting:
I'm just going to Ctrl-C and continue with one worker node for now.
actually, I take it back - it's not working on either worker node now. This step hangs:
Even when I run the linger command and daemon-reload, that message comes up.
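For reference, the linger/daemon-reload sequence being described is presumably something like the following; the key pitfall is reloading the user manager rather than the system one.

```shell
# Enable lingering so the user's systemd services survive logout
sudo loginctl enable-linger "$USER"
loginctl show-user "$USER" -p Linger   # expect: Linger=yes

# Reload the *user* manager, not the system one
systemctl --user daemon-reload
systemctl --user status docker
```

If `Linger=yes` is reported and the user services still die, checking `XDG_RUNTIME_DIR` (it should be `/run/user/$(id -u)`) is another common thing to rule out.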
The token value shouldn't be pasted publicly.
Seems like a networking issue. Is …
Weird - I'm getting the error earlier now (I haven't joined the worker nodes yet; this is from the control plane):
I don't think so - this is from the same host:
And here is from 002:
No route to host. That's so weird - this just worked on the previous cluster I brought up (with no differences).
I'm going to tear it down and bring it up again from scratch.
okay, this time I am trying a larger node (just to sanity check), and the pre-flight check failed for the first node:
And for the second node it's still hanging.

@AkihiroSuda I think you are 15 hours ahead of me, so 4pm my time is 7am your time, and 5pm my time is 8am (start of the work day?). We were planning on doing a small hackathon this Friday to work on this, and I wanted to invite you / see if you are available. 7am is quite early, but if you are up around 8am I think I could work a bit later on Friday. I could potentially do an hour later; I just need some notice for that!

High level: I'd like to get this terraform setup working, and working consistently, so I can contribute it here. I am going to try one more thing tonight - bringing it totally down and up, and running the scripts interactively. If there is some subtle difference with a service not persisting in this automated mode, that might explain it. I will update the thread here; let me know if you might have some time on Friday so we can bring up this setup and get your eyes on it (I am likely missing something obvious, and this is very likely the best means of finishing it up!)
Reproduced in manual running mode, so it's unlikely to be the automation bit.

✔ Network usernetes_default Created 0.1s
✔ Volume "usernetes_node-opt" Created 0.0s
✔ Volume "usernetes_node-etc" Created 0.0s
✔ Volume "usernetes_node-var" Created 0.0s
✔ Container usernetes-node-1 Started 4.9s
docker compose exec -e U7S_HOST_IP=10.10.0.4 -e U7S_NODE_NAME=u7s-usernetes-compute-002 -e U7S_NODE_SUBNET=10.100.153.0/24 node kubeadm join 10.10.0.3:6443 --token boydm6.lgdgji6o10zhcrww --discovery-token-ca-cert-hash sha256:60006cde0edda31f26cae0f2a80ef7fac7803d1121ab98678fa81edc220c212a
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2023-09-06T04:33:28Z" level=fatal msg="validate service connection: validate CRI v1 runtime API for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/containerd/containerd.sock: connect: no such file or directory\""
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
make: *** [Makefile:97: kubeadm-join] Error 1
make: Leaving directory '/opt/usernetes'
make: Entering directory '/opt/usernetes'
./Makefile.d/check-preflight.sh

I can confirm this works on the main control plane, so it's definitely just not being able to reach that port.
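The preflight failure above has two separate symptoms worth checking independently: containerd's socket missing inside the node container, and the apiserver port being unreachable from the worker host. A hedged diagnostic sketch, reusing the service name and endpoint from the compose output and kubeadm join line above:

```shell
# Is containerd actually up inside the failing node container?
docker compose exec node ps aux | grep -v grep | grep containerd
docker compose exec node ls -l /var/run/containerd/containerd.sock

# From the worker host: can we reach the control plane's apiserver?
# (10.10.0.3:6443 is the endpoint from the kubeadm join command above)
nc -vz -w 5 10.10.0.3 6443
```

If the socket is missing, the problem is inside the node container's startup; if `nc` fails, it's host-to-host networking (firewall/egress), which matches the "no route to host" seen earlier.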
I'm going to try adding egress for that port. It doesn't make sense that it worked the first time, but it's worth a shot!
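For reference, opening that port in GCP might look like the following. This is a sketch with hypothetical rule and network names; the 10.10.0.0/24 range is inferred from the host IPs in this thread and should be replaced with the actual VPC subnet.

```shell
# Hypothetical GCP firewall rules for the kubeadm/apiserver port (6443)
gcloud compute firewall-rules create u7s-apiserver-ingress \
  --network=NETWORK_NAME --direction=INGRESS \
  --allow=tcp:6443 --source-ranges=10.10.0.0/24

gcloud compute firewall-rules create u7s-apiserver-egress \
  --network=NETWORK_NAME --direction=EGRESS \
  --allow=tcp:6443 --destination-ranges=10.10.0.0/24
```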
Nice! So the nodes (one worker node) are coming up again. So I think it was egress, but I can't say why it worked the first time! There is still some flakiness with something related to the actual instance and cgroups; I've seen this a couple of times (usually just one node - it's like one of the nodes randomly starts without support for the updated cgroups, and reports missing systemd).
I would suspect this is Google Cloud or terraform related, not usernetes, but I don't know. But now that I know egress was an issue, and we had an issue with the ports for the test app, I'm going to blow it up again and expose more for egress. Will send an update!
okay, reproduced what I had earlier - it seems a bit flaky (not the usernetes, the terraform), but this did work a second time. The place we are at is that the nodes come up, but the test doesn't work.
Let me know if you might be able to join Friday! If not, we can keep going back and forth here. The next thing to figure out is why I can't shell into / connect to a pod.
Perhaps if there is some range of IPs that needs to be open for the pods, I should try adding them to egress. Adding the entire range seemed to bork the fix for 6443.
I'm off to bed - thanks for the help today @AkihiroSuda!
👍
VXLAN doesn't seem to work with Google Cloud by default, although it works with AWS and Azure. Likely related to MTU.
I'm going to ask if there are easy ways to get VXLAN working in GCP - ping @aojea. If not, I can prepare an equivalent setup on AWS. I have one for AWS with Flux, and I'd need to start that over to use a different Ubuntu base, remove Flux, etc.: https://github.com/converged-computing/flux-terraform-ami
VXLAN works; if there is an MTU problem, it is most probably solved by reducing the MTU on the origin, or increasing it in the network (VM) so the encapsulation goes through: https://cloud.google.com/vpc/docs/mtu
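Concretely, GCP VPCs default to an MTU of 1460, and VXLAN encapsulation adds roughly 50 bytes of overhead, so oversized inner packets can get silently dropped. A sketch of how to check and work around this (interface and network names are assumptions):

```shell
# Check the VM interface MTU (ens4 is a common GCE interface name)
ip link show ens4

# Test what payload size passes between hosts without fragmentation
# (-M do sets the Don't Fragment bit; shrink -s until it succeeds)
ping -M do -s 1400 -c 3 10.10.0.4

# One workaround: raise the VPC MTU so VXLAN overhead still fits
gcloud compute networks update NETWORK_NAME --mtu=1500
```

The alternative direction, per the GCP docs linked above, is lowering the MTU used by the overlay so encapsulated packets fit inside the VPC's existing MTU.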
@vsoch Are you still planning something today? (8:22 AM Friday here)
@AkihiroSuda my mistake in mixing up my reference days - it's still Thursday here! So our hackathon would be tomorrow at 3pm Mountain time in the US (it looks like that's about 21.5 hours from now). And we have two things we can look at: first is the usernetes setup here, and second is an AWS equivalent I've started, although we are still in the early steps (e.g., ensuring each node knows the hostnames of the others).
Sorry, I'm not attending then, but happy to help with your experiment with AWS.
no worries! I can give you an update then. I can tell you that I can't consistently get the GCP setup working, maybe because of networking stuff. It worked once, but then not again, even when I upped the MTU. I'm hoping we just have more luck on AWS and can develop there - will give you an update!
And @AkihiroSuda, next time we will make sure to plan one on our Thursday, which I am realizing is your Friday morning. Apologies for the oversight!
(Like kind and minikube KIC, but for multi-host; see #286)

Usernetes (Gen2) deploys a Kubernetes cluster on Rootless Docker hosts. Usernetes (Gen2) is similar to Rootless kind and Rootless minikube, but Usernetes (Gen2) supports creating a cluster with multiple hosts.
Components
Requirements
- Rootless Docker
- cgroup v2 delegation

Using Ubuntu 22.04 hosts is recommended.
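For the cgroup v2 delegation requirement, the commonly documented approach for rootless container runtimes is a systemd drop-in for `user@.service`; a sketch (the exact controller list to delegate may differ from what usernetes requires):

```shell
# Delegate cpu/cpuset/io/memory/pids controllers to user sessions (cgroup v2)
sudo mkdir -p /etc/systemd/system/user@.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids
EOF
sudo systemctl daemon-reload

# Verify delegation took effect for your user
cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
```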
Usage
See `make help`.